1,678 research outputs found

    Performance comparison between Java and JNI for optimal implementation of computational micro-kernels

    General-purpose CPUs used in high performance computing (HPC) support a vector instruction set and an out-of-order engine dedicated to increasing instruction-level parallelism. Hence, related optimizations are currently critical to improve the performance of applications requiring numerical computation. Moreover, the use of a Java run-time environment such as the HotSpot Java Virtual Machine (JVM) in high performance computing is a promising alternative. It benefits from programming flexibility and productivity, and its performance is ensured by the Just-In-Time (JIT) compiler. However, the JIT compiler suffers from two main drawbacks. First, the JIT is a black box for developers: we have no control over the generated code, nor any feedback from its optimization phases such as vectorization. Second, the compilation-time constraint limits the degree of optimization compared to static compilers like GCC or LLVM. It is therefore compelling to use statically compiled code, since it benefits from additional optimization that reduces performance bottlenecks. Java can call native code from dynamic libraries through the Java Native Interface (JNI). Nevertheless, JNI methods are not inlined and cost more to invoke than Java ones. Therefore, to benefit from better static optimization, this call overhead must be amortized by the amount of computation performed at each JNI invocation. In this paper we tackle this problem by carrying out this analysis for a set of micro-kernels. Our goal is to select the most efficient implementation given the amount of computation defined by the calling context. We also investigate the performance impact of several optimization schemes: vectorization, out-of-order optimization, data alignment, method inlining and the use of native memory for JNI methods. Comment: Part of ADAPT Workshop proceedings, 2015 (arXiv:1412.2347)
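The trade-off described in the abstract above (a fixed JNI call overhead against a faster native loop) can be illustrated with a simple linear cost model. This is only a sketch under assumptions that are not taken from the paper: the function names, the per-element costs and the overhead value are hypothetical, and real costs are rarely this linear.

```python
import math


def break_even_elements(call_overhead_ns, java_ns_per_elem, native_ns_per_elem):
    """Smallest element count n for which one native call beats the Java loop.

    Assumed cost model (hypothetical, not measured):
        java cost   = n * java_ns_per_elem
        native cost = call_overhead_ns + n * native_ns_per_elem
    """
    gain = java_ns_per_elem - native_ns_per_elem
    if gain <= 0:
        # Native code is not faster per element, so the call overhead
        # can never be recovered.
        return None
    # We need n * gain > call_overhead_ns (strictly).
    return math.floor(call_overhead_ns / gain) + 1


def pick_implementation(n, call_overhead_ns, java_ns_per_elem, native_ns_per_elem):
    """Return "jni" when the native call is expected to win for n elements."""
    threshold = break_even_elements(
        call_overhead_ns, java_ns_per_elem, native_ns_per_elem
    )
    if threshold is not None and n >= threshold:
        return "jni"
    return "java"
```

With a hypothetical overhead of 100 ns and per-element costs of 1.0 ns (Java) and 0.5 ns (native), the native version only pays off from 201 elements upward; below that, the Java loop is the better choice.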

    Is dynamic compilation possible for embedded systems ?

    JIT compilation and dynamic compilation are powerful techniques that delay final code generation until run-time. They bring many benefits: improved portability, virtual machine security, etc. Unfortunately, the tools used for JIT compilation and dynamic compilation do not meet the classical requirements of embedded platforms: their memory size is huge and code generation has large overheads. In this paper we show how dynamic code specialization (JIT) can be used and be beneficial in terms of execution speed and energy consumption while keeping the memory footprint under control. We base our approach on our tool deGoal and on LLVM, which we extended to produce lightweight runtime specializers from annotated LLVM programs. Benchmarks are transformed into templates, and a specialization routine is built to instantiate them. This approach produces efficient specialization routines with minimal energy consumption and memory footprint compared to a generic JIT application. Through several benchmarks, we present its efficiency in terms of speed, energy and memory footprint. We show that, compared to static compilation, we can achieve a speed-up of 21% in execution speed as well as a 10% energy reduction, with a moderate memory footprint.
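The idea of instantiating a template with values known only at run-time can be sketched in a few lines. This is not the authors' tool chain: Python source generation with the built-ins `compile` and `exec` stands in for the binary code generation performed by deGoal or LLVM, and the function `specialize_dot` is a hypothetical name chosen for illustration.

```python
def generic_dot(weights, x):
    """Generic routine: works for any weight vector, loops at every call."""
    return sum(w * v for w, v in zip(weights, x))


def specialize_dot(weights):
    """Instantiate a dot-product template for one fixed weight vector.

    The weights become constants in the generated code, the loop is fully
    unrolled, and terms with a zero weight are dropped (this assumes finite
    inputs, since 0 * inf would otherwise differ).
    """
    terms = [f"{w!r} * x[{i}]" for i, w in enumerate(weights) if w != 0]
    body = " + ".join(terms) if terms else "0"
    source = f"def dot_specialized(x):\n    return {body}\n"
    namespace = {}
    # Generate the specialised routine at run-time from the filled template.
    exec(compile(source, "<specialized>", "exec"), namespace)
    return namespace["dot_specialized"]


weights = [2, 0, 3]
dot = specialize_dot(weights)
result = dot([1, 5, 4])
```

Here the generated routine is equivalent to `return 2 * x[0] + 3 * x[2]`, and it returns the same value as `generic_dot(weights, [1, 5, 4])`.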

    Self-optimisation using runtime code generation for wireless sensor networks

    This paper addresses the use of runtime code specialisation in resource-constrained embedded systems such as nodes of a Wireless Sensor Network (WSN), in order to improve software efficiency and hence the lifetime of WSN nodes. In our approach, runtime code specialisation is achieved with in-place runtime code generation. We present a self-optimising system using runtime code generation. Our system automatically decides to generate specialised code and uses it each time an improvement in application performance is observed. In the Internet of Things (IoT), devices usually have limited precision; our system adapts to these devices by decreasing precision in order to increase performance. We evaluate our system on floating-point multiplication using the WisMote platform, where the specialised code executes more than 7 times faster than generic code, all overheads included. To the best of our knowledge, this is the first time that a runtime code generation system has been used to automatically optimise code in devices as constrained as WSN nodes.
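The decision step described above (generate specialised code, then keep it only when an improvement is observed, overheads included) might look roughly like the following. It is a minimal sketch, not the authors' system: the names `timed` and `self_optimize` are hypothetical, wall-clock timing replaces whatever metric the real system uses, and the sketch requires identical results on the sample, which ignores the precision trade-off mentioned in the abstract.

```python
import time


def timed(fn, arg, repeats):
    """Run fn(arg) `repeats` times; return (elapsed seconds, last result)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(arg)
    return time.perf_counter() - start, result


def self_optimize(generic, make_specialized, sample, repeats=1000):
    """Return the specialised routine only if it is observed to be faster.

    The time spent generating the specialised code is charged to it, so the
    comparison includes the generation overhead.
    """
    generic_time, expected = timed(generic, sample, repeats)

    start = time.perf_counter()
    candidate = make_specialized()
    generation_time = time.perf_counter() - start

    special_time, got = timed(candidate, sample, repeats)
    if got == expected and generation_time + special_time < generic_time:
        return candidate
    return generic  # no observed improvement: keep the generic code


def multiply_by_six(x):
    return x * 6


def make_shift_version():
    # Hypothetical specialisation: 6 * x rewritten as shifts and an add.
    return lambda x: (x << 2) + (x << 1)


chosen = self_optimize(multiply_by_six, make_shift_version, 7)
```

Whichever routine is selected on a given machine, `chosen(7)` evaluates to 42; only the speed differs.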

    Involvement of small-scale dairy farms in an industrial supply chain: When production standards meet farm diversity

    In certain contexts, dairy firms are supplied by small-scale family farms. Firms provide a set of technical and economic recommendations meant to help farmers meet their requirements in terms of the quantity and quality of milk collected. This study analyzes how such recommendations may be adopted by studying six farms in Brazil. All farms are beneficiaries of the country's agrarian reforms, but they differ in how they developed their activities, in their resources and in their milk collection objectives. First, we built a technical and economic benchmark farm based on recommendations from a dairy firm and farmer advisory institutions. Our analysis of the farms' practices and of their technical and economic results shows that none of the farms in the sample applies all of the benchmark recommendations; however, all farms specialized in dairy production observe the main underlying principles with regard to feeding systems and breeding. The decisive factors in whether the benchmark is adopted and successfully implemented are (i) access to the supply chain when a farmer establishes his activity, (ii) a grasp of reproduction and forage production techniques and (iii) an understanding of the principles of dietary rationing for dairy cattle. The technical problems observed in some cases affect the farms' dairy performance and cash position; this can lead to a process of disinvestment. This dynamic of farms facing production standards suggests that the diversity of specialized livestock farmers should be taken into account more effectively through advisory approaches that combine basic zootechnical training with assistance in planning farm activities over the short and medium term. (Author's abstract)

    Compilation for heterogeneous SoCs : bridging the gap between software and target-specific mechanisms

    Current application constraints are pushing for higher computation power with lower energy consumption, driving the development of increasingly specialized SoCs. Meanwhile, these SoCs are still programmed in assembly language to make use of their specific hardware mechanisms. Since the constraints on hardware development bring specialization, and hence heterogeneity, it is essential to support these new mechanisms with high-level programming. In this work, we use a parametric data flow formalism to abstract the application from any hardware platform. From this premise, we propose to contribute to the compilation of target-independent programs on heterogeneous platforms. These developments are threefold: 1) the support of hardware accelerators for computation using actor fusion, 2) the automatic generation of communications on complex memory layouts and 3) the synchronization of distributed cores using hardware mechanisms for scheduling. The code generation is illustrated on a heterogeneous SoC dedicated to telecommunications.

    Contrôle d'application flot de données pour les systèmes sur puces : étude de cas sur la plateforme Magali

    Embedded applications demand ever more computing power for less energy consumption, which has led to the emergence of dedicated systems on chip. In signal processing, the data-flow model of computation is commonly used to program these systems on chip. An execution model suited to these architectures and meeting the application constraints is therefore needed. In this work, we propose a new execution model for the control of data-flow applications. Our approach relies on the links between the characteristics of applications and the performance obtained under the associated execution model. This work is illustrated with a case study on the Magali platform.

    Cognitive Radio Programming: Existing Solutions and Open Issues

    Software defined radio (SDR) technology has evolved rapidly and is now reaching market maturity, providing solutions for cognitive radio applications. Still, many issues have yet to be studied. In this paper, we highlight the constraints imposed by recent radio protocols and we present current architectures and solutions for programming SDR. We also list the challenges to overcome in order to master future cognitive radio systems.

    deGoal a tool to embed dynamic code generators into applications

    The processing applications that are now being used in mobile and embedded platforms require at the same time a fair amount of processing power and a high level of flexibility, due to the nature of the data to process. In this context we propose a lightweight code generation technique that is able to perform data dependent optimizations at run-time for processing kernels. In this paper we present the motivations and how to use deGoal, a tool designed to build fast and portable binary code generators called compilettes.

    Code Generation for an Application-Specific VLIW Processor With Clustered, Addressable Register Files

    Modern compilers integrate recent advances in compiler construction, intermediate representations, algorithms and programming language front-ends. Yet code generation for application-specific architectures benefits only marginally from this trend, as most of the effort is oriented towards popular general-purpose architectures. Historically, non-orthogonal architectures have relied on custom compiler technologies, some retargetable, but largely decoupled from the evolution of mainstream tool flows. Very Long Instruction Word (VLIW) architectures have introduced a variety of interesting problems such as clusterization, packetization or bundling, instruction scheduling for exposed pipelines, long delay slots, software pipelining, etc. These have been addressed in the literature, with a focus on the exploitation of Instruction Level Parallelism (ILP). While these are well-known solutions already embedded into existing compilers, they rely on common hardware functionalities that are expected to be present in a fairly large subset of VLIW architectures. This paper presents our work on an LLVM-based back-end compiler for Mephisto, a high performance, low-power application-specific processor. Mephisto is specialized enough to challenge established code generation solutions for VLIW and DSP processors, calling for an innovative compilation flow. Conversely, even though Mephisto might be seen as a somewhat exotic processor, its hardware characteristics such as addressable register files benefit from existing analyses and transformations in LLVM. We describe our model of the Mephisto architecture, the difficulties we encountered, and the associated compilation methods, some of them new and specific to Mephisto.